• 検索結果がありません。

Solving Extremely Large Stochastic Mixed-Integer

Background and Purpose

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

!$!7F32<B353@ 7<3/@$@=5@/;;7<5 ;7<7;7H3A=@;/F7;7H3A/:7<3/@4C<1B7=<

7AAC0831BB=:7<3/@1=<AB@/7<BA 6/A7<B353@/<21=<B7<C=CAD/@7/0:3A

/ General form of

1=;07</B=@7/:=>B7;7H/B7=<>@=0:3;A / !/<G/>>:71/B7=<A

/ !$A=:D/07:7BG6/A033<7;>@=D7<5

min{c

x : Axb, lxu, x

j

∈ Z, for all jI}

A ∈ R

m×n

, b ∈ R

m

, c, l, u ∈ R

n

, I ⊆ { 1, . . . , n } min{c

x : Axb, lxu, x

j

∈ Z, for all jI}

A ∈ R

m×n

, b ∈ R

m

, c, l, u ∈ R

n

, I ⊆ { 1, . . . , n }

'7'.12/'051(#/#44+7'.:2#3#..'.41.7'3

1/<A=:D37<AB/<13A B6/B 1/<<=B 03 A=:D32 0G AB/B3=4B63/@B!$ A=:D3@A 933>A1/B167<5C>>3@4=@;/<137;>@=D3;3<BA=4AB/B3=4B63/@B!$A=:D3@A

Stochastic programming models optimization problems involving uncertainty An approach: Optimize minimum expected cost

Typically: Decisions are staged

Consider two-stage stochastic mixed-integer programs (SMIPs):

1st stage: deterministic “now” decisions

2nd stage: depends on random event & first stage decisions.

Cost function includes deterministic variables & expected value function of non-deterministic parameters

Stochastic Mixed Integer Programming

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

3F/;>:3

Using sample average approximation (SAA), can be approximated using deterministic equivalent formulations

This assumption yields characteristic dual block-angular structure.

Stochastic MIPs and their deterministic equivalent

A T1W1 T2 W2

..

. . . .

TN WN

minctx

x0 x1 x2 .. . xN

b0 b1 b2 .. . bN

s.t.

Common constraints Independent realization scenarios

}

A T1

T2

TN

... ...

W1

W2

WN

1.+0-+0)%10453#+054 *'4#/'$.1%-4+;'

PIPS-SBB: a distributed-memory parallel solver for deterministic equivalent SMIPs

PIPS-SBB is a specialized Branch-and-Bound based solver for two-stage SMIPs Distributed Memory approach: Distribute data among multiple processors.

PIPS-S: Distributed Memory parallel Simplex solver for Stochastic LPs (Lubin et al., 2013) PIPS-SBB: SMIP-specific cuts, heuristics, branching rules, etc.

Current progress: CP18 Integer and Combinatorial Optimization – Part I (Wed. 17:30) PIPS-SBB: A distributed-Memory Linear-Programming-Based Branch-and-Bound Solver for Stochastic Mixed-Integer Programs given by Geoffrey Malcolm Oxberry

...

A T1 T2

TN

... ...

...

W1 W2

WN

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Branch and Bound for solving MIPs

Branch and Bound (B&B): Most widely used algorithms for solving MIPs to optimality Upper Bounds (UB) are provided by the integer solutions found along the B&B

exploration

Each subproblem has a Lower Bound (LB) associated with the LP relaxation Upper bound (UB)

Lower bound (LB) UBLB

UB ·100 GAP(%)

| |

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Branch and Bound is straightforward to parallelize in theory

Processing of subproblems is independent Shared-memory parallelization present in most

state-of-the-art MIP solvers

Distributed-memory parallelization is not as straightforward Need to distribute work among different nodes

Information sharing results in communication overheads Classification: Smallest transferrable unit of work

Coarse-grained parallelization: Unit of work is MIP subtree Fine-grained parallelization: Unit of work is MIP tree node

Branch and Bound is straightforward to parallelize in theory

Processing of subproblems is independent

Processing of nodes is a sequential computational bottleneck

PIPS-SBB: B&B based Stochastic MIP solver Processing of nodes in parallel (parallel LP relaxation,

parallel heuristics, parallel problem branching, …).

...

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

'44#)'#44+0)05'3(#%'

Branch and Bound is straightforward to parallelize in theory

Processing of subproblems is independent

Processing of nodes is a sequential computational bottleneck

PIPS-SBB: B&B based Stochastic MIP solver Processing of nodes in parallel (parallel LP relaxation,

parallel heuristics, parallel problem branching, …).

Parallelized B&BPIPS-SBB: Two levels of parallelism Branch and Bound in parallel (coarse/fine)

Coarse-grained: ug[PIPS-SBB, MPI]

Fine-grained: PIPS-PSBB

...

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

'44#)'#44+0)05'3(#%'

Parallel Branch and Bound as a graph problem

We can regard parallel Branch and Bound as a parallel graph exploration problem Given p processors, frontierof a tree is the set of subproblems currently being open Subset currently processed in parallel are the activenodes

Redundantnode: A subproblem fathomable given that the optimal solution is known.

Goal: Increase efficiency of Parallel B&B by reducing number of useless nodes explored.

Parallel Branch and Bound

To reduce the amount of useless nodes explored, the search must Fathom subproblems using high quality UBs

How to focus on the most promising nodes To increase the parallel efficiency by:

Generating a set of active nodes comprised of the most promising nodes Employing processors to explore the smallest amount of active nodes With less communication overhead

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Coarse-grained Parallel Branch and Bound: MIP subtrees

Advantages:

Less communication between MPI processes Centralized implementations simpler to implement Challenges:

Extra work likely to be performed when compared to the sequential case May scale poorly due to load imbalance

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Course-grained approach to Parallel Branch and Bound using UG

Ubiquity Generator Framework(UG) is a generic framework to parallelize B&B solvers Exploits powerful performance of state-of-the-art base solvers, such as SCIP, Xpress, Gurobi, CPLEX External parallelization using Sub-MIPs (Unit of work: MIP subtree)

Solving techniques involved in SCIP

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

/ MILP (Mixed Integer Linear Programming) Problem

/ Parallel solving with mathematically supercharged algorithms

/ Finite, but exponential search space

B&B: implicit enumeration for global optimality / UG: parallelization framework for state-of-the-art

B&B algorithms (e.g. for MINLP, etc.) / Parallelize SCIP: Solving Constraint Integer Programs / Parallelize commercial solvers

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

min{cx:Axb, lxu, xj∈Z,for alljI}

A∈Rm×n, b∈Rm, c, l, u∈Rn, I⊆ {1, . . . , n}

A#@757</:

AC0>@=0:3; A’$@3A=:D32 AC0>@=0:3;

min{cx:Axb, lxu, xj∈Z,for alljI}

A∈Rm×n, b∈Rm, c, l, u∈Rn, I⊆ {{1, . . . , n}}

exponential sea 74471C:B G</;71:=/2

0/:/<17<5

UG Ubiquity Generator Framework - 1

UG Ubiquity Generator Framework - 2

!(3#/'813-)A7<5$B= 1=<B@=:

A=:D7<5/:5=@7B6;A )A7<5$B=1=<B@=:

A=:D7<5/:5=@7B6;A )A7<5$B=1=<B@=:

A=:D7<5/:5=@7B6;A

/A3A=:D3@ /A3A=:D3@ /A3A=:D3@

)A7<5=@25*3'#&4

4=@1=;;C<71/B7=<A )A7<5=@25*3'#&4

4=@1=;;C<71/B7=<A )A7<5=@25*3'#&4 4=@1=;;C<71/B7=<A A6/@32;3;=@G

C5,25*3'#&4-703@'$

C5,"23'4425*3'#&4-703@*>@3AA

=/2A/@31==@27</B320G/A>317/:>@=13AA=@B6@3/2 /A3A=:D3@

#>@3A=:D3

27AB@70CB32;3;=@G C5,-$/@/'$

C5,"23'44-$/@/*>@3AA

#3#..'.

1.7'3 045#05+#5+10 =/2=@@27</B=@

)A7<5$B=1=<B@=:

A=:D7<5/:5=@7B6;A

/A3A=:D3@

)A7<5=@25*3'#&4 4=@ 1=;;C<71/B7=<A

!1.7'3

95'30#.

2#3#..'.+;#5+10

UG Ubiquity Generator Framework - 3

Main features provided by UG Ramp-up mechanism

Normal ramp-up, racing ramp-up Dynamic load balancing

Checkpointing and restarting mechanism Deterministic mode for debugging

Some facts about ug[SCIP, MPI]: ParaSCIP

Solved 2 MIPLIB2003 previously unsolved instances for the first time Solved 12 MIPLIB2010 previously unsolved instances for the first time ug[SCIP-Jack, MPI] solve three previously unsolved instances

from PUC Steiner Tree Problem test set

SCIP-Jack: A customized SCIP solver for Steiner Tree Problem

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Main results for MIP solving by ParaSCIP

"=D3;03@B@7>B7;E/AA=:D32E7B61=@3A7<6=C@A 313;03@@;7<3E/AA=:D32E7B6@3AB/@B328=0A

(7B/<@/G*#>B3@=< H@/G3;7<77<B3@1=<<31B '!C87BAC$&!&+&*'

&"':B7F **3=< %H *H &"@/G*<B3:*3=<DH@73A7<B3@1=<<31B

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

0303@ 3333333 /A A= 32 B 3AB/ B32 8=0A

B/< @/G*#>B3@=< H @/G3;7<77<B3@1=<<31B '! C87BAC $&!&+ &*'

AA=:D32E7B61=@3A7<6=C@A 3

3 3 3

3E/AA=:D32E7B6@3AB/@B328=0A

The biggest and the longest computation

Solving rmine10: 48 restarted runs with 6,144 to 80,000 cores

1 100 10000 1x106 1x108 1x1010

0 10000 20000 30000 40000 50000 60000 70000 80000 90000

Number of Nodes Number of Active Solvers + 1

# nodes left

# active solvers

# nodes in check-point file -1916

-1915.5 -1915 -1914.5 -1914 -1913.5 -1913 -1912.5 -1912

0 1x106 2x106 3x106 4x106 5x106 6x106 7x106

Objective Function Value

Computing Time (sec.) Incumbents

Global LBs

=EC>>3@/<2:=E3@0=C<2A3D=:D32

80000 90000

+ 1# nodes left

# active solvers

# d i h k i t fil x106 2x106 3x106 4x106 5x106 6x106 7x106

Computing Time (sec.) Incumbents

Global LBs

EC>>3@/<2:=E3@0=C<2A3D=:D32

(7B/<E7B61=@3A (63=B63@A &"

5511-#$165&#:4

#0&:'#341(

!%13'*1634

*#0&.'62!%#0

51

231%'44

Performance of state-of-the-art MIP Solvers

!$A=:D3@03<16;/@9B6@3/2'674B3253=;3B@71;3/<=4@3AC:BAB/93<4@=;

B636=;3>/53=4 /<A!7BB3:;/<< >@ )<A=:D32=@4/7:327<AB/<13A /@3/11=C<B324=@E7B6B63B7;3:7;7B=46=C@A

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

>3@4=@;/<13C53 27443@3<13

'$'=:D7<5=<AB@/7<B<B353@$@=5@/;A

=<3=4B63;=AB>=E3@4C:<=<1=;;3@17/:!$A=:D3@

extendable to solve more large classes of

=>B7;7H/B7=<>@=0:3;A

B6@3/2

Solving open instances of MIPLIB2010

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

+3/@

"C;03@=4=>3<7<AB/<13AA=:D32 6BB> ;7>:70H7023

=>3<7<AB/<13A:34B

!$ E/A

>C0:7A632

&" &"

1=@3A1=@3A

"3E$/@/*>@3AA

Course-grained approach to Parallel Branch and Bound using UG

Ubiquity Generator Framework(UG) is a generic framework to parallelize B&B solvers Exploits powerful performance of state-of-the-art base solvers, such as SCIP, Xpress, Gurobi, CPLEX External parallelization using Sub-MIPs (Unit of work: MIP subtree)

Base solver and communication library used are encapsulatedby UG:

ug[PIPS-SBB,MPI]

Course-grained approach to Parallel Branch and Bound using UG

Dynamic load balancing using Lower Bounds (quality of “goodness”):

LoadCoordinator (LC) maintains queue of “good” nodes LC Requests nodes from “good” solvers when queue size is small LC hands out node from queue to solver when requested Solver requests “good” node when current nodes are not “good”

Other details:

Multiple Ramp-up mechanisms: Normal ramp-up, racing ramp-up Checkpointing and restarting mechanism

Deterministic mode for debugging

Capable of handling distributed-memory based base solver

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Course-grained approach to Parallel Branch and Bound using UG

mpirun –np 10 ./ugpipssbbSMPS … -npipsbk 3 E/7B7<5

@C<<7<5 .1$#.+0%6/$'05

41.65+10

1#&113&+0#513#0-!1.7'3

#0- !1.7'3

#0- !1.7'3

#0-$$''&/<9

$$''&/<9

$$''&/<9

$$''&/<9

$$''&/<9

$$''&/<9

$$''&/<9

$$''&/<9

$$''&/<9

'&#0-!41.7'33#0-$$'' &/<9

$$''&/<9

$$''&/<9

'C0!$1=;>CB/B7=<7A>3@4=@;32E7B6;C:B7>:3!$>@=13AA3A

"332B=AG<16@=<7H3E63</)'=:D3@1=;;C<71/B3E7B6 k 3

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Fine-grained Parallel Branch and Bound: MIP tree nodes

The smallest transferrable unit of work is a Branch and Bound node.

Queues in processors are collections of subtrees (frontier nodes) Advantage: Allows for great flexibility and fine-grained control

Disadvantage: More communication

Solution: Coordination and exchange is decentralized to minimize communication

Fine-grained control: Share the most promising nodes Solution: All-to-all parallel node exchange of best active nodes

Load balancing is maintained via synchronous MPI collective communications

Lower boundsof most promising K x N nodes of all processors are exchanged and ranked

The top K x N are selected and nodesare redistributed in a round robin fashion Because of fine-grained nature of the

approach, communication must be used strategically to minimize overheads Status of each solver (Upper/lower bounds, tree

sizes, times, solutions, …) exchanged asynchronously

Node transfers are synchronous Exchanges are triggered judiciously

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

0 1 2 3 5 6 8 15 16 4 7 9 10

12

11 13 14 17 18 19 Solver 0

0 1 2 3 5 6 8 15 16 4 7 9 10

12

11 13 14 17 18 19 Solver 2

0 1 2 3 5 6 8 15 16 4 7 9 10

12

11 13 14 17 18 19 Solver 0

0 1 2 3 5 6 8 15 16 4 7 9 10

12

11 13 14 17 18 19 Solver 1

0 1 2 3 4 5 6 7 8

Solver 0

0 1 2 3

Solver 1 Solver 2

5 6 8

4 7 9 10 11 12

14 13

Gather top K · N · N bounds (K nodes · N solvers · N solvers)

Solver 0

0 3 6

Solver 1 Solver 2

2 5 8

11 10 12 Redistribution of

top K · N nodes

1 4 7 9

13 14

16 20 21

15 22

18

17 19

Sort, and select top K · N bounds

K=3, N=3

16 20 21

15 22

18

17 19

Solver 2

0 1 2 3 4 5 6 7 8 Solver 1

0 1 2 3 4 5 6 7 8 Solver 0

0 1 2 3 5 6 8 15 16 4 7 9 10

12

11 13 14 17 18 19 Solver 0

0 1 2 3 5 6 8 15 16 4 7 9 10

12

11 13 14 17 18 19 Solver 0

n Node estimation/bound n Node information

Fine-grained approach to Parallel Branch and Bound

Two levels of parallelism require a layered organization of MPI processors In the PIPS-S communicator,

processors perform in parallel:

Presolve, LP relaxations

Primal Heuristics, Cuts, Branching and Node Selection

In the Branch and Bound communicator, processors exchange:

B&B Nodes, Lower Bound information Solutions, Queue sizes, and Search status Two Strategies for Ramp-up:

Parallel Strong Branching Standard Branch and Bound

Strategy for Ramp-down: intensify the frequency of node rebalancing

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

We test our solver on SSLP instances, from the SIPLIB library SSLP instances model server locations under uncertainty

Instances are coded as SSLPm_n_s, where s represents the number of scenarios Larger number of scenarios means bigger problems

LP relaxations of all instances fit in memory, even in CPLEX PIPS-SBB can handle much larger LP relaxations

Details: See http://www2.isye.gatech.edu/~sahmed/siplib/sslp/sslp.html Larger number of scenarios means bigger problems

LP relaxations of all instances fit in memory, even in CPLEX PIPS-SBB can handle much larger LP relaxations

Computationally intensive features MIP Heuristics: on, Strong Branching: off All runs on the Cab cluster at LLNL:

Each node: Intel Xeon E5-2670, 2.6 GHz, 2 CPUs x 8 cores/CPU 16 cores/node, 2 GB RAM/core, 32 GB RAM/node Infiniband QDR interconnect

Experimental performance results: Setup

We measure parallel performance in terms of wall clock time, speedup, communication overhead, number of nodes processed and primal dual integral

Wall clock time

Speedup: Fraction of time Tpneeded to reach optimality by a configuration with p processors with respect to the time needed by a sequential baseline T1:

Communication overhead (Parallel PIPS-SBB): Fraction of time Tcomm+ Tsyncneeded for communication and processor synchronization with respect to the total time of execution Texec:

Idle time ratio (ug[PIPS-SBB,MPI]): Fraction of average idle time with respect to execution Texec:

Node efficiency:Fraction of redundant nodes explored Nr with respect to the total number of nodes explored Ntotal.

Experimental performance results:

Measuring parallel performance

$

$ "

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

2:3B7;3@/B7=$#

!

$

$+(%(#$%#&(!)%$

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Performance comparison between PIPS-PSBB and ug[PIPS-SBB,MPI] when optimizing small instances.

sslp_15_45_5 (5 scenarios, 3390 binary variables, 301 constraints)

$$'$'

o '1/:3AC>B=1=@3AF o (=B/:E=@9>3@4=@;32@3;/7<A

E7B67</4/1B=@=4FE@B A3?C3<B7/:

o =;;C<71/B7=<=D3@63/2 2=;7</B3A/4B3@1=@3A o "=237<3447173<1G5@=EA/B/

A:=E3@@/B3B6/<C5,$$' '!$- C5,$$''!$-o A/4/1B=@=4FA:=E3@

o (=B/:E=@9D/@73A0G>@=13AA=@

1=<475C@/B7=<

o 7563@1=;;C<71/B7=<=D3@63/2 /<267563@<=237<3447173<1G 0

10 20 30 40 50 60 70

Communicationoverhead(%)

1 2 10 50 100200400

Number of Processors 0

10 20 30 40 50 60 70

Scaling:Timetooptimality

1 2 10 50 100 200400

Number of Processors

100000 200000 300000 400000 500000 600000 700000

Totaltreesize

1 2 10 50 100 200400

Number of Processors 0 10 20 30 40 50 60 70

Nodeinefficiency(%)

1 2 10 50 100200400

Number of Processors

PIPS-PSBB ug[PIPS-SBB,MPI]

+$!$* %##+$!*!%$('+$.%

/ PIPS-PSBB allows to modify the frequency between synchronous communications.

/ Frequency defined with (x,y), where xand yrepresent the minimum and maximum number of B&B iterations that must be processed before communication takes place.

/ Tighter communication increases communication overheads, but reduces work performed.

0 10 20 30 40 50 60 70 80 90

Communicationoverhead(%)

1 2 10 50 100 200 400 Number of Processors 0

10 20 30 40 50 60 70

Scaling:Timetooptimality

1 2 10 50100 200 400 Number of Processors

150000 200000 250000 300000 350000 400000 450000 500000

Totaltreesize

1 2 10 50100 200 400 Number of Processors Tight Communication (10-500) Standard Communication (50-1000) Loose Communication (100-50000)

%",(&(%(#$-&%)))"&

)$(!%)!$(.,(!") %$)*(!$*)

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

0 100 200 300 400 500 600

RootnodeLPRelaxationTime(s)

1[500] 5[100] 10[50] 20[25] 25[20]50[10]100[5]500[1]

Number of solvers[Processors per PIPS-S solver]

0 2e+06 4e+06 6e+06 8e+06 1e+07 1.2e+07

Totaltreesize

1[500 ] 5[100

]

10[50] 20[25] 25[20] 50[10] 100[5] 500[1]

Number of solvers[Processors per PIPS-S solver]

0 5 10 15 20 25

Communicationoverhead(%)

1[500] 5[100] 10[50]

20[25]

25[20]

50[10]

100[5]

500[1]

Number of solvers[Processors per PIPS-S solver]

0 2 4 6 8 10 12 14 16 18

Rampupcommunicationoverhead(%)

1[500] 5[100] 10[50] 20[25] 25[20] 50[10] 100[5] 500[1]

Number of solvers[Processors per PIPS-S solver]

$$'/ '>332C>B=1=@3A7AF / $3@4=@;/<137<1@3/A3AC>B=

1=@3A

$$'$'

/ =;;C<71/B7=<=D3@63/2

;7<7;/:3F13>B/B@/;>C>

E63< $A=:D3@7AA:=E

%#&(!)%$!$)*

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

Instance Scenarios Configuration PIPS-PSBB ug[PIPS-SBB,UG] CPLEX SM CPLEX DM

Solvers PIPS-S procs

GAP(%) (Time)(s) GAP(%) (Time)(s) Procs GAP(%) (Time)(s) Procs GAP(%) (Time)(s)

sslp_5_25_50 50 2 2 (7.45s) (8.03s) 4 (0.27s) 4 (0.27s)

sslp_5_25_100 100 2 2 (22.37s) (17.79s) 4 (0.64s) 4 (0.64s)

sslp_15_45_5 5 200 2 (107.11s) (163.53s) 16 (1.97s) 400 (6.26s)

sslp_15_45_10 10 200 2 0.09% 0.16% 16 (1.81s) 400 (15.04s)

sslp_15_45_15 15 200 2 0.25% 0.30% 16 (7.80s) 400 (15.75s)

sslp_10_50_50 50 200 10 0.13% 0.21% 16 (43.88s) 2000 0.15%(M)

sslp_10_50_100 100 200 10 0.17% 0.20% 16 (221.69s) 2000 0.16%(M)

sslp_10_50_500 500 200 10 0.24% 0.24% 16 4.91%(M) 2000 1.25%(M)

sslp_10_50_1000 1000 200 10 0.24% 0.24% 16 9.91% 2000 6.08%

sslp_10_50_2000 2000 200 10 0.26% 0.26% 16 19.93% 2000 8.11%

Time limit: 1 hour

/ Distributed-memory parallelization of CPLEX is often inferior to its shared-memory counterpart.

/ Both CPLEX versions run into Memory limits for some problems.

/ The superior performance of CPLEX’s base solver helps in trivial and small problems.

/ PIPS-SBB-based solvers show superior performance for large problems.

Performance comparison against CPLEX 12.6.2

UG Synthesizer

)''=:D3@A<!$1=;;C<71/B=@4=@)'7A>@=D7232 C5A

!327/B3 A=:CB7=<

A6/@7<5

$.*>@3AAC5A 1=<475 A=:D3@A

3C@7AB71A=:D3@A can be distributed

/'/13:41.7'34$1.&'&)'

<

$/@/*>@3AAC5A 1=<475

$/@/'$C5A

1=<475

<

1=<475*>@3AAC5A 1=<475C@=07C5A 1=<475$ *C5A

<

>/@/::3:>@=13AA3A

>/@/::3:>@=13AA3A

1=<475C@/B7=<47:3 A1@7>BB=

53<3@/B3@C<B7;3 3<D7@=<;3<B ;/93AG;0=:71:7<9A 53<3@/B3/<;>7@C< 1=;;/<2

4=@/<!$!!$>@=5@/;

&C<B7;3>@=13AA3A

=</AC>3@1=;>CB3@

)''=:D3@A<!$1=;;C<71/B=@4=@)'7A>@=D7232 C5A

!327/B3 A=:CB7=<

A6/@7<5

>

$.*>@3AAAC5A 1=<475 1=<475 A=:D3@A

3C@7AB71A=:D3@A can be distributed

/'/13:41.7'34$1.&'&)'

<

<

$/@/*>@3AA$/@/*>@3AAC5A 1=<475 1=<475

$/@/'$C5A 1=<475

1=<475

< < <

1=<475*>@3AAC5A55

b di t b A

C@=0C5A 5 1=<4755

>>

$$

t ib t dd t ib 07

$ *C5A 5

1=<4755

< <

> >

PAC: L. M unguia, S. Ahmed, D. Bader, G.L. Nemhauser and Y. Shao. ‘‘Alternating Criteria

Conclusions

2D/<13;3<B=4;/B63;/B71/:;=23:=427A/AB3@>@3D3<B7=</<23D/1C/B7=<>:/<<7<5B=E/@2A=17/:7;>:3;3<B/B7=<

We developed a light-weight decentralized distributed memory branch-and-bound implementation for PIPS-SBB with two degrees of parallelism:

Processing of nodes in parallel (parallel LP relaxation, parallel heuristics, parallel problem branching, …).

Branch and Bound in parallel.

Better parallel efficiency is achieved by focusing the parallel resources in the most promising nodes.

We try to reduce communication bottlenecks and achieve high processor occupancy via a decentralized control of the tree exploration and a lightweight mechanism for exchanging Branch and Bound nodes.

/ Superior performance to state-of-the-art commercial MIP solvers, in the context of large instances.

The SCIP Optimization Suite

Matthias Miltenberger

Zuse Institute Berlin

26thFebruary, 2015

Matthias Miltenberger (ZIB): The SCIP Optimization Suite 1

The SCIP Optimization Suite

toolbox forgeneratingandsolvingconstraint integer programs

free for academic use, available in source code

ZIMPL

model and generate LPs, MIPs, and MINLPs